Co-Occurrence Cluster Features for Lexical Substitutions in Context
نویسنده
چکیده
This paper examines the influence of features based on clusters of co-occurrences for supervised Word Sense Disambiguation and Lexical Substitution. Cooccurrence cluster features are derived from clustering the local neighborhood of a target word in a co-occurrence graph based on a corpus in a completely unsupervised fashion. Clusters can be assigned in context and are used as features in a supervised WSD system. Experiments fitting a strong baseline system with these additional features are conducted on two datasets, showing improvements. Cooccurrence features are a simple way to mimic Topic Signatures (Martı́nez et al., 2008) without needing to construct resources manually. Further, a system is described that produces lexical substitutions in context with very high precision.
منابع مشابه
Creating a system for lexical substitutions from scratch using crowdsourcing
This article describes the creation and application of the Turk Bootstrap Word Sense Inventory for 397 frequent nouns, which is a publicly available resource for lexical substitution. This resource was acquired using Amazon Mechanical Turk. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then, more conte...
متن کاملChoosing the Word Most Typical in Context Using a Lexical Co-Occurrence Network
This paper presents a partial solution to a component of the problem of lexical choice: choosing the synonym most typical, or expected, in context. We apply a new statistical approach to representing the context of a word through lexical co-occurrence networks. The implementation was trained and evaluated on a large corpus, and results show that the inclusion of second-order co-occurrence relat...
متن کاملLexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs
The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...
متن کاملLexical Co-occurrence, Statistical Significance, and Word Association
Lexical co-occurrence is an important cue for detecting word associations. We present a theoretical framework for discovering statistically significant lexical co-occurrences from a given corpus. In contrast with the prevalent practice of giving weightage to unigram frequencies, we focus only on the documents containing both the terms (of a candidate bigram). We detect biases in span distributi...
متن کاملAutomated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition
This paper describes a method to improve speech recognition for non-native speech in a spoken dialogue system. Based on very general rules about possible vocalic substitutions, the frequency of occurrence of each substitution in different phonetic contexts is estimated on a small set of recordings. The most frequently observed substitutions are applied to the lexicon of the recognizer. Speakers...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010